Stochastic EM for Shuffled Linear Regression
Authors
Abstract
We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known. Such datasets naturally arise from experiments in which the samples are shuffled or permuted during the protocol. In this work, we propose a framework that treats the unknown permutation as a latent variable. We maximize the likelihood of observations using a stochastic expectation-maximization (EM) approach. We compare this to the dominant approach in the literature, which corresponds to hard EM in our framework. We show on synthetic data that the stochastic EM algorithm we develop has several advantages, including lower parameter error, less sensitivity to the choice of initialization, and significantly better performance on datasets that are only partially shuffled. We conclude by performing two experiments on real datasets that have been partially shuffled, in which we show that the stochastic EM algorithm can recover the weights with modest error.
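The contrast between hard EM and stochastic EM described in the abstract can be illustrated with a minimal sketch. The abstract does not say how permutations are sampled, so everything specific here is an assumption: a Metropolis-Hastings sampler with pairwise swap proposals, scalar outputs with Gaussian noise of fixed variance `sigma2`, and initialization from the observed ordering. The function names (`hard_e_step`, `stochastic_e_step`, `stochastic_em`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def hard_e_step(X, y, w):
    """Hard assignment: pair the k-th smallest prediction with the k-th smallest label."""
    perm = np.empty(len(y), dtype=int)
    perm[np.argsort(X @ w)] = np.argsort(y)
    return perm  # y[perm[i]] is the label assigned to row i of X

def stochastic_e_step(X, y, w, perm, sigma2, n_steps, burn_in, rng):
    """Sample permutations given w using Metropolis-Hastings swaps (assumed sampler).

    Returns the reordered label vector averaged over the retained samples
    and the final state of the chain.
    """
    pred = X @ w
    perm = perm.copy()
    y_mean = np.zeros(len(y))
    for step in range(n_steps):
        i, j = rng.integers(len(y), size=2)
        # Change in total squared error if the labels at rows i and j are swapped.
        delta = 2.0 * (y[perm[i]] - y[perm[j]]) * (pred[i] - pred[j])
        # Symmetric proposal: accept with probability min(1, exp(-delta / (2 * sigma2))).
        if rng.exponential() > delta / (2.0 * sigma2):
            perm[i], perm[j] = perm[j], perm[i]
        if step >= burn_in:
            y_mean += y[perm]
    return y_mean / (n_steps - burn_in), perm

def stochastic_em(X, y, sigma2, n_iters=20, n_steps=2000, burn_in=500, seed=0):
    """Alternate sampled E-steps with least-squares M-steps."""
    rng = np.random.default_rng(seed)
    perm = np.arange(len(y))                  # assumption: start from the observed ordering
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # assumption: initial weights from that ordering
    for _ in range(n_iters):
        y_mean, perm = stochastic_e_step(X, y, w, perm, sigma2, n_steps, burn_in, rng)
        # M-step: minimizing the squared error summed over samples is the same as
        # least squares against the sample-averaged labels.
        w = np.linalg.lstsq(X, y_mean, rcond=None)[0]
    return w, perm
```

In this sketch, hard EM would replace the sampled E-step with `hard_e_step` and refit `w` on `y[perm]`, committing to a single permutation per iteration instead of averaging over many.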
Similar resources
Liu Estimates and Influence Analysis in Regression Models with Stochastic Linear Restrictions and AR (1) Errors
In linear regression models with an AR(1) error structure, when collinearity exists, stochastic linear restrictions or modifications of biased estimators (including Liu estimators) can be used to reduce the estimated variance of the regression coefficient estimates. In this paper, the combination of the biased Liu estimator and stochastic linear restrictions estimator is considered to overcom...
Diagnostic Measures in Ridge Regression Model with AR(1) Errors under the Stochastic Linear Restrictions
Outliers and influential observations have important effects on regression analysis. The goal of this paper is to extend the mean-shift model for detecting outliers to the ridge regression model in the presence of stochastic linear restrictions when the error terms follow an autoregressive AR(1) process. Furthermore, extensions of measures for diagnosing influential observations are ...
Shuffled Frog-Leaping Programming for Solving Regression Problems
There are various automatic programming models inspired by evolutionary computation techniques. Because it is important to devise an automatic mechanism for exploring the complicated search space of mathematical problems where numerical methods fail, evolutionary computation is widely studied and applied to solve real-world problems. One of the famous algorithms in optimization problems is shuffl...
Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training
We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison...
High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm
We propose a generic stochastic expectation-maximization (EM) algorithm for the estimation of high-dimensional latent variable models. At the core of our algorithm is a novel semi-stochastic variance-reduced gradient designed for the Q-function in the EM algorithm. Under a mild condition on the initialization, our algorithm is guaranteed to attain a linear convergence rate to the unknown paramete...